Forecasting: Principles and Practice Chapter 7 - Time Series Regression Models

R. J. Serrano

03/02/2023

Introduction —

In this chapter, we will discuss regression models. The basic concept is that we forecast the time series of interest \(y\) assuming that it has a linear relationship with other time series \(x\).

7.1 - The linear model —

In the simplest case, the regression model allows for a linear relationship between the forecast variable \(y\) and a single predictor \(x\):
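
In its simplest form, the model is

\[ y_t = \beta_0 + \beta_1 x_t + \varepsilon_t, \]

where \(\beta_0\) is the intercept (the predicted value of \(y\) when \(x = 0\)), \(\beta_1\) is the slope (the average change in \(y\) resulting from a one-unit increase in \(x\)), and \(\varepsilon_t\) is the error term.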

Figure 7.1: An example of data from a simple linear regression model

Example: US consumption expenditure

Let’s plot the time series of quarterly changes (growth rates) of real personal consumption expenditure (\(y\)) and real personal disposable income (\(x\)) for the U.S. from 1970 Q1 to 2019 Q2.

data("us_change")
us_change

us_change %>% 
     pivot_longer(c(Consumption, Income), 
                  names_to = 'Series') %>% 
     autoplot(value) + 
     labs(y = '% change')

Using the base R lm() function to estimate the intercept and slope coefficients.

us_chg_tbl <- us_change %>% 
     as_tibble() %>% 
     select(Consumption, Income)

model_lm <- lm(Consumption ~ Income, data = us_chg_tbl)
summary(model_lm)
## 
## Call:
## lm(formula = Consumption ~ Income, data = us_chg_tbl)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.58236 -0.27777  0.01862  0.32330  1.42229 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.54454    0.05403  10.079  < 2e-16 ***
## Income       0.27183    0.04673   5.817  2.4e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5905 on 196 degrees of freedom
## Multiple R-squared:  0.1472, Adjusted R-squared:  0.1429 
## F-statistic: 33.84 on 1 and 196 DF,  p-value: 2.402e-08

Scatterplot with fitted regression line.

us_change |>
     ggplot(aes(x = Income, y = Consumption)) +
     labs(y = "Consumption (quarterly % change)",
          x = "Income (quarterly % change)") +
     geom_point() +
     geom_smooth(method = "lm", se = FALSE)

Cross-correlation between the two time series.

# base R
ccf(us_chg_tbl$Consumption, us_chg_tbl$Income)

us_change %>% 
     CCF(Consumption, Income) %>% 
     autoplot()

The cross-correlation plot can help identify which lags of the predictor might be useful as additional regressors.
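
As a minimal sketch (not from the text), suppose the CCF suggested the first lag of Income is informative; we could include it as an extra regressor:

# hypothetical example: add a one-quarter lag of Income as a predictor;
# lm() silently drops the first row, where the lagged value is NA
us_lag <- us_chg_tbl %>% 
     mutate(Income_lag1 = dplyr::lag(Income, 1))

model_lag <- lm(Consumption ~ Income + Income_lag1, data = us_lag)
summary(model_lag)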

Multiple regression

us_change |> 
     
     # drop `Consumption`, `Income` columns
     select(-Consumption, -Income) |>
     pivot_longer(-Quarter) |>
     ggplot(aes(Quarter, value, colour = name)) +
     geom_line() +
     facet_grid(name ~ ., scales = "free_y") +
     guides(colour = "none") +
     labs(y="% change")

Pair plots for us_change

library(GGally)
us_change %>% 
     ggpairs(columns = 2:6, 
             lower = list(
                  continuous = wrap('smooth', 
                                    color = 'steelblue', 
                                    alpha = 0.3, 
                                    size = 0.3)
                  )
             )

Assumptions about the errors (residuals)

The model assumes that the errors have mean zero, are not autocorrelated, and are unrelated to the predictor variables. It is also useful for the errors to be normally distributed with constant variance, so that prediction intervals can be produced easily.

7.2 - Least squares estimation —

The least squares principle provides a way of choosing the coefficients effectively by minimising the sum of the squared errors.
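
For the multiple regression model

\[ y_t = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + \dots + \beta_k x_{k,t} + \varepsilon_t, \]

the least squares estimates are the values of \(\beta_0, \beta_1, \dots, \beta_k\) that minimise

\[ \sum_{t=1}^{T} \varepsilon_t^2 = \sum_{t=1}^{T} \left( y_t - \beta_0 - \beta_1 x_{1,t} - \dots - \beta_k x_{k,t} \right)^2. \]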

Example: US consumption expenditure

fit_consMR <- us_change |>
     model(tslm = TSLM(Consumption ~ Income + Production +
                                    Unemployment + Savings))
report(fit_consMR)
## Series: Consumption 
## Model: TSLM 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.90555 -0.15821 -0.03608  0.13618  1.15471 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.253105   0.034470   7.343 5.71e-12 ***
## Income        0.740583   0.040115  18.461  < 2e-16 ***
## Production    0.047173   0.023142   2.038   0.0429 *  
## Unemployment -0.174685   0.095511  -1.829   0.0689 .  
## Savings      -0.052890   0.002924 -18.088  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3102 on 193 degrees of freedom
## Multiple R-squared: 0.7683,  Adjusted R-squared: 0.7635
## F-statistic:   160 on 4 and 193 DF, p-value: < 2.22e-16

Fitted values

augment(fit_consMR) |>
     ggplot(aes(x = Quarter)) +
     geom_line(aes(y = Consumption, colour = "Data")) +
     geom_line(aes(y = .fitted, colour = "Fitted")) +
     labs(y = NULL,
          title = "Percent change in US consumption expenditure"
     ) +
     scale_colour_manual(values=c(Data="black",Fitted="#D55E00")) +
     guides(colour = guide_legend(title = NULL))

augment(fit_consMR) |>
     ggplot(aes(x = Consumption, y = .fitted)) +
     geom_point() +
     labs(
          y = "Fitted (predicted values)",
          x = "Data (actual values)",
          title = "Percent change in US consumption expenditure"
     ) +
     geom_abline(intercept = 0, slope = 1, color = 'red')

Goodness-of-fit

A common way to summarise how well a linear regression model fits the data is via the coefficient of determination, or \(R^{2}\). For the example above, the \(R^{2}\) = 0.768.
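
The coefficient of determination is the proportion of variation in the forecast variable accounted for by the regression,

\[ R^2 = \frac{\sum \left( \hat{y}_t - \bar{y} \right)^2}{\sum \left( y_t - \bar{y} \right)^2}, \]

which also equals the squared correlation between the observed and fitted values. A quick sanity check in R, using the fit_consMR model estimated above:

# R^2 should match the squared correlation between fitted and actual values
augment(fit_consMR) |>
     as_tibble() |>
     summarise(r_squared = cor(.fitted, Consumption)^2)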

7.3 - Evaluating the regression model

Residual plots

Using the gg_tsresiduals() function introduced in Section 5.3, we can obtain the standard residual diagnostics (time plot of the residuals, ACF plot, and histogram).

fit_consMR %>% 
     gg_tsresiduals()

augment(fit_consMR) |> 
     features(.innov, ljung_box, lag = 10)

The time plot shows some changing variation over time, but is otherwise relatively unremarkable. This heteroscedasticity will potentially make the prediction interval coverage inaccurate.

The histogram shows that the residuals seem to be slightly skewed, which may also affect the coverage probability of the prediction intervals.

The autocorrelation plot shows a significant spike at lag 7, and a significant Ljung-Box test at the 5% level. However, the autocorrelation is not particularly large, and at lag 7 it is unlikely to have any noticeable impact on the forecasts or the prediction intervals.

Residual plots against predictors

The residuals from the multiple regression model for forecasting US consumption, plotted against each predictor in Figure 7.9, seem to be randomly scattered. Therefore we are satisfied with the model in this case.

us_change |>
     left_join(residuals(fit_consMR), by = "Quarter") |>
     pivot_longer(Income:Unemployment,
                  names_to = "regressor", values_to = "x") |>
     ggplot(aes(x = x, y = .resid)) +
     geom_point() +
     facet_wrap(. ~ regressor, scales = "free_x") +
     labs(y = "Residuals", x = "")

Residual plots against fitted values

A plot of the residuals against the fitted values should also show no pattern. If a pattern is observed, there may be “heteroscedasticity” in the errors which means that the variance of the residuals may not be constant.

augment(fit_consMR) |>
     ggplot(aes(x = .fitted, y = .resid)) +
     geom_point() + labs(x = "Fitted", y = "Residuals")

Outliers and influential observations

Observations that take extreme values compared to the majority of the data are called outliers. Observations that have a large influence on the estimated coefficients of a regression model are called influential observations. Usually, influential observations are also outliers that are extreme in the \(x\) direction.
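
As a quick sketch (applied to the base R simple regression from Section 7.1; this is not a method shown in the text), influence diagnostics can flag such observations:

# influence measures for the earlier lm() fit; rows flagged with '*'
# are potentially influential
summary(influence.measures(model_lm))

# residuals vs leverage, with Cook's distance contours
plot(model_lm, which = 5)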

Spurious regressions

Regressing non-stationary time series can lead to spurious regressions: a high \(R^2\) together with high residual autocorrelation, even when the series are unrelated. Spurious correlations between unrelated series arise in the same way.
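
A small demonstration (not from the text): two independent random walks will often appear strongly related in a regression.

# unrelated random walks can still produce a high R-squared
set.seed(42)
rw1 <- cumsum(rnorm(200))
rw2 <- cumsum(rnorm(200))
summary(lm(rw1 ~ rw2))$r.squared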

7.4 - Some useful predictors

Example: Australian quarterly beer production

recent_production <- aus_production |> 
     filter(year(Quarter) >= 1992)
recent_production |>
     autoplot(Beer) + 
     geom_smooth(method = 'loess', se = FALSE, color = 'steelblue') + 
     labs(y = "Megalitres",
          title = "Australian quarterly beer production")

We want to forecast the value of future beer production. We can model this data using a regression model with a linear trend and quarterly seasonal dummy variables.

TSLM - Fit a linear model with time series components

fit_beer <- recent_production |> 
     model(TSLM(Beer ~ trend() + season()))

report(fit_beer)
## Series: Beer 
## Model: TSLM 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -42.9029  -7.5995  -0.4594   7.9908  21.7895 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   441.80044    3.73353 118.333  < 2e-16 ***
## trend()        -0.34027    0.06657  -5.111 2.73e-06 ***
## season()year2 -34.65973    3.96832  -8.734 9.10e-13 ***
## season()year3 -17.82164    4.02249  -4.430 3.45e-05 ***
## season()year4  72.79641    4.02305  18.095  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.23 on 69 degrees of freedom
## Multiple R-squared: 0.9243,  Adjusted R-squared: 0.9199
## F-statistic: 210.7 on 4 and 69 DF, p-value: < 2.22e-16

Note that trend() and season() are not standard functions; they are “special” functions that work within the TSLM() model formulae.

Visualize observed and predicted values (augment)

augment(fit_beer) |>
     ggplot(aes(x = Quarter)) +
     geom_line(aes(y = Beer, colour = "Data")) +
     geom_line(aes(y = .fitted, colour = "Fitted")) +
     scale_colour_manual(
          values = c(Data = "black", Fitted = "#D55E00")
     ) +
     labs(y = "Megalitres",
          title = "Australian quarterly beer production") +
     guides(colour = guide_legend(title = "Series"))

Plotly version

g <- augment(fit_beer) %>% 
     select(Quarter, Beer, .fitted) %>% 
     as_tibble() %>% 
     pivot_longer(-Quarter) %>% 
     ggplot(aes(Quarter, value, group = name, color = name)) + 
     geom_line(linewidth = 0.5) + 
     labs(x = NULL, 
          y = "Megalitres",
          title = "Australian quarterly beer production", 
          color = 'Series')

library(plotly)   # ggplotly() comes from the plotly package
ggplotly(g)

Visualize observed and predicted values by quarter

augment(fit_beer) |>
     ggplot(aes(x = Beer, y = .fitted,
                colour = factor(quarter(Quarter)))) +
     geom_point() +
     labs(y = "Fitted", x = "Actual values",
          title = "Australian quarterly beer production") +
     geom_abline(intercept = 0, slope = 1) +
     guides(colour = guide_legend(title = "Quarter"))

Additional predictors

Besides the trend and seasonal dummy variables, the chapter describes other predictors that are known in advance: intervention variables (spikes and steps), trading days, distributed lags (e.g., of advertising expenditure), and holiday effects such as Easter.

Fourier series

An alternative to using seasonal dummy variables, especially for long seasonal periods, is to use Fourier terms.
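
With a seasonal period \(m\), the first few Fourier terms are

\[ x_{1,t} = \sin\left(\tfrac{2\pi t}{m}\right),\quad x_{2,t} = \cos\left(\tfrac{2\pi t}{m}\right),\quad x_{3,t} = \sin\left(\tfrac{4\pi t}{m}\right),\quad x_{4,t} = \cos\left(\tfrac{4\pi t}{m}\right),\ \dots \]

and a regression model containing Fourier terms is often called a harmonic regression.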

These Fourier terms are produced using the fourier() function. For example, the Australian beer data can be modelled like this.

fourier_beer <- recent_production |> 
     model(TSLM(Beer ~ trend() + fourier(K = 2)))

report(fourier_beer)
## Series: Beer 
## Model: TSLM 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -42.9029  -7.5995  -0.4594   7.9908  21.7895 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        446.87920    2.87321 155.533  < 2e-16 ***
## trend()             -0.34027    0.06657  -5.111 2.73e-06 ***
## fourier(K = 2)C1_4   8.91082    2.01125   4.430 3.45e-05 ***
## fourier(K = 2)S1_4 -53.72807    2.01125 -26.714  < 2e-16 ***
## fourier(K = 2)C2_4 -13.98958    1.42256  -9.834 9.26e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.23 on 69 degrees of freedom
## Multiple R-squared: 0.9243,  Adjusted R-squared: 0.9199
## F-statistic: 210.7 on 4 and 69 DF, p-value: < 2.22e-16

The K argument to fourier() specifies how many pairs of sin and cos terms to include. The maximum allowed is \(K = m / 2\) where m is the seasonal period. Because we have used the maximum here, the results are identical to those obtained when using seasonal dummy variables.

7.5 - Selecting predictors (Feature selection)

A common approach that is not recommended is to plot the forecast variable against a particular predictor and if there is no noticeable relationship, drop that predictor from the model. This is invalid because it is not always possible to see the relationship from a scatterplot, especially when the effects of other predictors have not been accounted for.

Another common approach which is also invalid is to do a multiple linear regression on all the predictors and disregard all variables whose \(p\)-values are greater than 0.05. To start with, statistical significance does not always indicate predictive value. Even if forecasting is not the goal, this is not a good strategy because the \(p\)-values can be misleading when two or more predictors are correlated with each other (see Section 7.8).

Instead, we will use a measure of predictive accuracy. Five such measures are introduced in this section. They can be shown using the glance() function, here applied to the model for US consumption:

glance(fit_consMR) |> 
     select(adj_r_squared, CV, AIC, AICc, BIC)

We compare these values against the corresponding values from other models. For the CV, AIC, AICc and BIC measures, we want to find the model with the lowest value; for Adjusted \(R^2\), we seek the model with the highest value.
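
For reference, writing \(T\) for the number of observations, \(k\) for the number of predictors, \(e_t\) for the residuals, \(h_t\) for the leverage values, and \(\text{SSE} = \sum_{t=1}^{T} e_t^2\), the five measures are:

\[ \bar{R}^2 = 1 - (1 - R^2)\frac{T-1}{T-k-1}, \qquad \text{CV} = \frac{1}{T}\sum_{t=1}^{T}\left[ e_t/(1-h_t) \right]^2, \]

\[ \text{AIC} = T\log\left(\frac{\text{SSE}}{T}\right) + 2(k+2), \qquad \text{AIC}_c = \text{AIC} + \frac{2(k+2)(k+3)}{T-k-3}, \qquad \text{BIC} = T\log\left(\frac{\text{SSE}}{T}\right) + (k+2)\log(T). \]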

Figure: comparison chart of the predictive accuracy measures, illustrated with the US consumption example.

7.6 - Forecasting with regression

Ex-ante forecasts are those that are made using only the information that is available in advance.

For example, ex-ante forecasts for the percentage change in US consumption for quarters following the end of the sample, should only use information that was available up to and including 2019 Q2.

In order to generate ex-ante forecasts, the model requires forecasts of the predictors.

Ex-post forecasts are those that are made using later information on the predictors.

For example, ex-post forecasts of consumption may use the actual observations of the predictors, once these have been observed.

These are not genuine forecasts, but are useful for studying the behaviour of forecasting models.
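
As a sketch (the training cutoff of 2016 Q4 here is an arbitrary assumption, not from the text), an ex-post forecast can be produced by fitting on a training period and then supplying the actually observed predictor values:

# fit on data up to 2016 Q4, then forecast using the observed predictors
train <- us_change |> 
     filter(Quarter <= yearquarter("2016 Q4"))

fit_ex_post <- train |> 
     model(TSLM(Consumption ~ Income + Production + Unemployment + Savings))

fit_ex_post |> 
     forecast(new_data = filter(us_change, Quarter > yearquarter("2016 Q4")))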

Example: Australian quarterly beer production

Normally, we cannot use actual future values of the predictor variables when producing ex-ante forecasts because their values will not be known in advance.

However, the special predictors introduced in Section 7.4 are all known in advance, as they are based on calendar variables (e.g., seasonal dummy variables or public holiday indicators) or deterministic functions of time (e.g. time trend).

In such cases, there is no difference between ex-ante and ex-post forecasts.

recent_production <- aus_production |> 
     filter(year(Quarter) >= 1992)

fit_beer <- recent_production |> 
     model(TSLM(Beer ~ trend() + season()))

fc_beer <- forecast(fit_beer)

fc_beer |>
     autoplot(recent_production) +
     labs(
          title = "Forecasts of beer production using regression",
          y = "megalitres"
     )

Scenario-based forecasting

In this setting, the forecaster assumes possible scenarios for the predictor variables that are of interest.

For example, a US policy maker may be interested in comparing the predicted change in consumption when there is constant growth of 1% and 0.5% respectively for income and savings, with no change in the unemployment rate, versus a respective decline of 1% and 0.5%, for each of the four quarters following the end of the sample.

fit_consBest <- us_change |> 
     model(
          lm = TSLM(Consumption ~ Income + Savings + Unemployment)
     )
future_scenarios <- scenarios(
     Increase = new_data(us_change, 4) |>
          mutate(Income=1, Savings=0.5, Unemployment=0),
     Decrease = new_data(us_change, 4) |>
          mutate(Income=-1, Savings=-0.5, Unemployment=0),
     names_to = "Scenario")

fc <- forecast(fit_consBest, new_data = future_scenarios)

us_change |>
     autoplot(Consumption) +
     autolayer(fc) +
     labs(title = "US consumption", y = "% change")

Building a predictive regression model

The great advantage of regression models is that they can be used to capture important relationships between the forecast variable of interest and the predictor variables.

However, for ex-ante forecasts, these models require future values of each predictor, which can be challenging to obtain.

If forecasting each predictor is too difficult, we may use scenario-based forecasting instead, where we assume specific future values for all predictors.

Example

fit_cons <- us_change |>
     model(TSLM(Consumption ~ Income))

new_cons <- scenarios(
     "Average increase" = new_data(us_change, 4) |>
          mutate(Income = mean(us_change$Income)),
     "Extreme increase" = new_data(us_change, 4) |>
          mutate(Income = 12),
     names_to = "Scenario"
)

fcast <- forecast(fit_cons, new_cons)

us_change |>
     autoplot(Consumption) +
     autolayer(fcast) +
     labs(title = "US consumption", y = "% change")

7.7 - Nonlinear regression

Although the linear relationship assumed so far in this chapter is often adequate, there are many cases in which a nonlinear functional form is more suitable. To keep things simple in this section we assume that we only have one predictor \(x\).

The simplest way of modelling a nonlinear relationship is to transform the forecast variable \(y\) and/or the predictor variable \(x\) before estimating a regression model. While this provides a non-linear functional form, the model is still linear in the parameters. The most commonly used transformation is the (natural) logarithm (see Section 3.1).
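
The most useful of these is the log-log form,

\[ \log y = \beta_0 + \beta_1 \log x + \varepsilon, \]

in which the slope \(\beta_1\) can be interpreted as an elasticity: the average percentage change in \(y\) resulting from a 1% increase in \(x\).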

Forecasting a nonlinear trend

Example: Boston marathon winning times

boston_men <- boston_marathon |>
     filter(Year >= 1924) |>
     filter(Event == "Men's open division") |>
     mutate(Minutes = as.numeric(Time)/60)

boston_men

Let’s plot Minutes against Year with a fitted linear trend line (method = ‘lm’).

boston_men %>% 
     ggplot(aes(Year, Minutes)) + 
     geom_line(linewidth = 1) + 
     geom_smooth(method = 'lm', se = FALSE, color = 'steelblue') + 
     labs(x = 'Year', 
          y = 'Minutes', 
          title = 'Boston marathon winning times')

Now, let’s create the same plot, but with a ‘loess’ smoother.

boston_men %>% 
     ggplot(aes(Year, Minutes)) + 
     geom_line(linewidth = 1) + 
     geom_smooth(method = 'loess', se = FALSE, color = 'steelblue') + 
     labs(x = 'Year', 
          y = 'Minutes', 
          title = 'Boston marathon winning times')

It is clear that a linear trend does not fit the observed data well: the steep decline in winning times up to about 1980 is not sustained afterwards, so the fitted line overshoots the later observations.

Residual analysis from the linear model

fit_boston <- boston_men %>% 
     model(TSLM(Minutes ~ Year))

fit_boston %>% 
     gg_tsresiduals()

Applying different models to the time series
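
Here the linear model uses the trend \(x_{1,t} = t\); the exponential model fits the same trend to \(\log y_t\); and the piecewise model adds hinge terms at the knots \(\tau_1 = 1950\) and \(\tau_2 = 1980\):

\[ x_{1,t} = t, \qquad x_{2,t} = (t - \tau_1)_+, \qquad x_{3,t} = (t - \tau_2)_+, \]

where \((u)_+ = u\) if \(u > 0\) and \(0\) otherwise, so the slope of the trend is allowed to change at each knot.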

fit_trends <- boston_men |>
     model(
          linear = TSLM(Minutes ~ trend()),
          exponential = TSLM(log(Minutes) ~ trend()),
          piecewise = TSLM(Minutes ~ trend(knots = c(1950, 1980)))
     )

fc_trends <- fit_trends |> forecast(h = 10)

boston_men |>
     autoplot(Minutes) +
     geom_line(data = fitted(fit_trends),
               aes(y = .fitted, colour = .model)) +
     autolayer(fc_trends, alpha = 0.5, level = 95) +
     labs(y = "Minutes",
          title = "Boston marathon winning times")

The best forecasts appear to come from the piecewise linear trend.

7.8 - Correlation, causation and forecasting

Correlation is not causation

It is important not to confuse correlation with causation, or causation with forecasting. A variable \(x\) may be useful for forecasting a variable \(y\), but that does not mean \(x\) is causing \(y\). It is possible that \(x\) is causing \(y\), but it may be that \(y\) is causing \(x\), or that the relationship between them is more complicated than simple causality.

We describe a variable that is not included in our forecasting model as a confounder when it influences both the response variable and at least one predictor variable. Confounding makes it difficult to determine what variables are causing changes in other variables, but it does not necessarily make forecasting more difficult.

Forecasting with correlated predictors

When two or more predictors are highly correlated it is always challenging to accurately separate their individual effects.

Having correlated predictors is not really a problem for forecasting, as we can still compute forecasts without needing to separate out the effects of the predictors.

However, it becomes a problem with scenario forecasting as the scenarios should take account of the relationships between predictors. It is also a problem if some historical analysis of the contributions of various predictors is required.

Multicollinearity and forecasting

A closely related issue is multicollinearity, which occurs when similar information is provided by two or more of the predictor variables in a multiple regression.

It can occur when two predictors are highly correlated with each other (that is, they have a correlation coefficient close to +1 or -1).

In this case, knowing the value of one of the variables tells you a lot about the value of the other variable.

Hence, they are providing similar information. For example, foot size can be used to predict height, but including the size of both left and right feet in the same model is not going to make the forecasts any better, although it won’t make them worse either.
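
As a quick check (not from the text), a correlation matrix of the candidate predictors makes strong pairwise relationships easy to spot:

# pairwise correlations among the US consumption predictors;
# values near +1 or -1 warn of potential multicollinearity
us_change |>
     as_tibble() |>
     select(Income, Production, Savings, Unemployment) |>
     cor()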